TST[string]: update expecteds for using_string_dtype to fix xfails #61727

jbrockmendel · 2025-06-27T22:59:38Z

It isn't 100% obvious that the new repr for Categoricals is an improvement, but it's non-crazy. One of the remaining xfails is for eval(repr(categorical_index)) round-tripping, which won't be fixable unless we revert to the old repr behavior.

I'm pretty sure that the fix in test_astype_dt64_to_string is correct and the test is just wrong, but it merits a close look.

That leaves 12 xfails, including the one un-fixable round-trip one that we'll just remove. Of those...

test_join_on_key I think is surfacing an unrelated bug that I'll take a look at.
test_to_dict_of_blocks_item_cache is failing because we don't make series.values read-only for ArrowStringArray. I think @mroeschke can comment on how viable/important that is.
test_string_categorical_index_repr is about CategoricalIndex reprs that span multiple lines; with StringDtype the padding changes.
4 in pandas/tests/io/json/test_pandas.py that I'm hoping @WillAyd can take point on.
test_to_string_index_with_nan: there's a MultiIndex level that reprs with a nan instead of NaN. Not a huge deal, but having mixed-and-matched nans/NaNs in the repr is weird.
test_from_records_sequencelike I don't have a good read on.
tests.base.test_misc::test_memory_usage is skipped instead of xfailed, but the reason says that it "doesn't work properly" for arrow strings, which seems xfail-adjacent. Instead of skipping, can we update the expected behavior? cc @jorisvandenbossche

(Update: looks like I missed one in test_http_headers and another in test_fsspec)

WillAyd · 2025-06-30T13:05:15Z

The JSON issues stem from the fact that:

>>> pd.Series([None, '', 'c']).astype(object)

yields different behavior with/without the future string dtype. In the "old" world, this would preserve the value of None but in the new world, None gets cast to a missing value indicator when contained within a series of string values.

In theory we could try to work around those semantics by natively supporting an object type in the JSON reader, but that's a ton of effort and I don't think it's worth it, given that JSON does not natively support object storage.
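A minimal sketch of the None handling described above. It assumes an explicit `dtype="string"` is a fair stand-in for the inferred string dtype; under the future string dtype the missing-value sentinel may be NaN rather than `pd.NA`, so the check uses `pd.isna` instead of comparing against a specific sentinel. The variable names are illustrative only.

```python
import pandas as pd

values = [None, "", "c"]

# "Old" world: the data is held as object dtype, so None is stored as-is.
as_object = pd.Series(values, dtype=object)

# "New" world (assumption: explicit StringDtype as a stand-in for inference):
# None is normalized to the dtype's missing-value indicator on construction,
# and casting back to object does not restore the original None.
as_string = pd.Series(values, dtype="string")
round_tripped = as_string.astype(object)

print(as_object[0] is None)       # is the original None preserved?
print(round_tripped[0] is None)   # is it still None after the string round trip?
print(pd.isna(round_tripped[0]))  # is it a missing-value indicator instead?
```

The empty string is unaffected in both cases; only the None entry changes identity, which is what the JSON tests' expecteds would need to account for.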

jbrockmendel · 2025-06-30T15:25:23Z

thanks, will update those tests' expecteds

TST: update expecteds for using_string_dtype to fix xfails

91dfbee
